Logical Layout Recovery: approach for graphic-based features

Authors

  • Aysylu Gabdulkhakova
  • Tamir Hassan
  • Walter G. Kropatsch
Abstract

In contrast to existing approaches to document analysis and understanding, this paper presents a system that considers the logical role of graphic content in predominantly textual, born-digital PDF documents. The work was inspired by the idea of using structural graphic objects to clarify the logical layout even of complex, mostly graphic documents. Based on visual cognition, geometric features and spatial relations, the proposed statistical method distinguishes illustrative graphic objects from structural graphic objects. We evaluated the method on two document domains, newspapers and technical manuals, and found the results to be reliable. We propose the use of logical information about graphic content as a new step towards domain-independent document understanding systems.

Similar resources

Detection, Extraction and Representation of Tables

We are concerned with the extraction of tables from exchange format representations of very diverse composite documents. We put forward a flexible representation scheme for complex tables, based on a clear distinction between the physical layout of a table and its logical structure. Relying on this scheme, we develop a new method for the detection and the extraction of tables by an analysis of ...

Content features for logical document labeling

The use of content features extracted from recognized text is valuable in labeling logical elements in documents without rigid layout structure, like business letters. This paper discusses a model-based approach to combining content features with other geometrical and presentation features for logical labeling. Models are automatically initialized and adaptively improved using training samples....

Towards a Logical Foundation for Qualitative Decision Theory

The purpose of this abstract is to motivate and outline a logical theory of preference that arose from some very concrete and practical considerations. For some time now [6], we have been working on a formal model for representing and processing (such as laying out) structured electronic documents. One of the sub-problems in our project was designing a declarative language in which graphic and...

Improved CHAID algorithm for document structure modelling

This paper proposes a technique for the logical labelling of document images. It makes use of a decision-tree based approach to learn and then recognise the logical elements of a page. A state-of-the-art OCR gives the physical features needed by the system. Each block of text is extracted during the layout analysis and raw physical features are collected and stored in the ALTO format. The data-...

Layout Understanding A Knowledge Based Approach

The layout of information is a complex task, and is usually entrusted to a professional graphic designer. Research into automatic layout attempts to develop computational models of the problem-solving techniques used by graphic designers to solve layout problems. Graphic designers develop their problem-solving skills through years of learning by following three approaches: i) learning by being tol...

Publication date: 2012